Abstract: Discretization algorithms for solving semi-infinite minimax problems replace the original problem
by an approximation involving the maximization over a finite number of functions and then solve the
resulting  approximate  problem.  The  approximate  problem  gives  rise  to  a  discretization  error  and  its  solution 
results  in  an  optimization  error  as  a  minimizer  of  that  problem  is  rarely  achievable  with  a  finite  computing 
budget.  Accounting  for  both  discretization  and  optimization  errors,  we  determine  the  rate  of  convergence 
of  discretization  algorithms  as  a  computing  budget  tends  to  infinity.  We  find  that  the  rate  of  convergence 
depends  on  the  class  of  optimization  algorithms  used  to  solve  the  approximate  problem  as  well  as  the  policy 
for  selecting  discretization  level  and  number  of  optimization  iterations.  We  construct  optimal  policies  that 
achieve  the  best  possible  rate  of  convergence  and  find  that  under  certain  circumstances  the  better  rate  is 
obtained  by  inexpensive  gradient  methods. 
vergence, exponential smoothing technique.


1  Introduction 

In  many  applications  such  as  investment  portfolio  allocation,  engineering  design,  and  policy 
optimization,  decision  makers  need  to  determine  a  best  course  of  action  in  the  presence  of 
uncertain  parameters.  One  possibility  for  handling  these  situations  is  to  formulate  and  solve 
“robust”  optimization  models  where  the  optimal  decision  is  determined  in  view  of  worst-case 
parameter  values.  We  refer  to  [1-3]  for  an  overview  of  recent  developments.  In  this  paper, 
we  consider  robust  optimization  models  in  the  form  of  the  semi-infinite  minimax  problem 

$$(P) \quad \min_{x \in X} \psi(x),$$

where $X = \mathbb{R}^d$ or a simple closed convex subset of $\mathbb{R}^d$ (for example a polyhedron), $\psi : \mathbb{R}^d \to \mathbb{R}$
is defined by

$$\psi(x) = \max_{y \in Y} \phi(x, y), \qquad (1)$$

Y is a compact subset of $\mathbb{R}^m$, $\phi : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}$, and $\phi(\cdot, \cdot)$ is as smooth as required by the
applied algorithm in its first argument and Lipschitz continuous in its second argument. We
refer to m as the uncertainty dimension.

There are numerous algorithms for solving (P) such as exchange algorithms, local reduction
algorithms, smoothing methods, and discretization algorithms; see for example [4-7],
[8, Chapter 3], and [1, Chapter 2]. When $\psi(\cdot)$ is convex, bundle and (sub)gradient methods
also apply [9, 10]. Discretization algorithms are an attractive class of algorithms due to
their simplicity, sound theory, and the need for few assumptions. These algorithms construct
an approximation of (P) by replacing Y by a subset of finite cardinality and then
(approximately) solving the resulting finite minimax problem using a suitable optimization
algorithm. Since the maximization over $y \in Y$ is replaced by maximization over a finite number
of scalars, restrictive assumptions such as concavity of $\phi(x, \cdot)$ for all $x \in X$ and convexity
of Y are avoided. Of course, if the uncertainty dimension is high, discretization may be
impractical. Discretization algorithms are therefore mainly applied to problem instances
with small uncertainty dimensions as often encountered in engineering design, where the
uncertain parameter(s) may be time, frequency, and/or temperature; see for example [11]
and references therein. Under the assumption that $\phi(\cdot, y)$ is (twice) continuously differentiable
for all $y \in Y$ and X is simple, the finite minimax problem can be solved by standard
nonlinear programming algorithms as well as specialized algorithms; see for example [11, 12].
Some discretization algorithms involve constructing and solving a sequence of finite minimax
problems with increasing level of discretization (see for instance [8, Section 3.4]), but in this
paper we focus on algorithms based on the solution of a single finite minimax problem.

It is well known that given a suitable discretization of Y and relatively mild assumptions,
global and local minimizers as well as stationary points of the finite minimax problem
converge  to  corresponding  points  of  (P),  as  the  level  of  discretization  grows  to  infinity;  see 
for example [8, Chapter 3] and [13]. The rate of convergence of global minimizers is of order
$O(\rho_N^{1/p})$, where $\rho_N$ is the meshsize of a discretization of Y using N discretization points and
p is a growth parameter [13]; see also [14]. The rate is improved under additional assumptions
on the set of maximizers in (1) at an optimal solution of (P) [13]. The importance of
including boundary points of Y in the discretization and the resulting rate of convergence as
N tends to infinity is discussed in [14]. While these results provide important insight, they
do not consider the computational work required to solve the finite minimax problem.

The apparent simplicity of discretization algorithms hides a fundamental trade-off between
the level of discretization of Y and the computational work required to approximately
solve the resulting finite minimax problem. One would typically require a fine discretization
of Y to guarantee that the finite minimax problem approximates (P), in some sense, with
high accuracy. However, in that case, the finite minimax problem becomes large scale (in the
number of functions to maximize over) and the computational work to solve it may be high
[11, 12]. A coarser discretization saves in the solution time of the correspondingly smaller
finite minimax problem at the expense of a poorer approximation of (P). It is often difficult
in practice to construct discretizations of Y that balance this trade-off effectively.

In  this  paper,  we  examine  the  rate  of  convergence  of  a  class  of  discretization  algorithms  as 
a  computing  budget  tends  to  infinity.  We  show  that  the  policy  for  selecting  discretization  level 
of  Y  relative  to  the  size  of  the  available  computing  budget  influences  the  rate  of  convergence 
of  discretization  algorithms.  We  identify  optimal  discretization  policies,  in  a  precisely  defined 
sense,  for  discretization  algorithms  based  on  finitely,  superlinearly,  linearly,  and  sublinearly 
convergent  optimization  algorithms  for  solving  the  resulting  finite  minimax  problems.  We 
also  construct  an  optimal  discretization  policy  for  the  case  when  the  finite  minimax  problem 
is  solved  by  an  exponential  smoothing  algorithm,  where  the  level  of  smoothing  must  be 
determined  too. 

Other than [13, 14], there are few studies dealing with rate of convergence of discretization
algorithms. For a class of adaptive discretization algorithms, where a sequence of finite
minimax problems is solved with gradually higher and adaptively determined levels of discretization,
[15, 16] show that suitable rules for selecting the levels of discretization lead
to a rate of convergence, as the number of iterations tends to infinity, that is identical to
the rate of convergence of the algorithm used to solve the finite minimax problems. Consequently,
loosely speaking, the number of iterations required to achieve a certain tolerance
when  solving  (P)  is  the  same  as  that  when  solving  a  finite  minimax  problem  obtained  from 
(P)  by  discretization.  The  computational  work  in  each  iteration,  however,  may  grow  rapidly 
as  successively  finer  discretization  levels,  and  consequently  larger  finite  minimax  problems, 
must  be  considered  in  the  adaptive  discretization  algorithm.  To  our  knowledge,  there  are 
no  studies  that  attempt  to  quantify  the  rate  of  convergence  of  discretization  algorithms  for 
semi-infinite  minimax  problems  in  terms  of  a  computing  budget,  accounting  for  both  the 
number  of  iterations  and  the  work  in  each  iteration. 

An alternative to discretization algorithms is an approach based on algorithm implementation.
Here an existing optimization algorithm, which when applied to (P) may involve
conceptual steps such as finding $y^* \in \arg\max_{y \in Y} \phi(x, y)$, is “implemented” by replacing the
conceptual steps with approximations. The ε-subgradient method for (P) is an example of
an algorithm implementation of the subgradient method under convexity-concavity assumptions.
The implementation of (fast) gradient methods for problem instances where function
and gradient evaluations cannot be carried out exactly is discussed in [10]. That study identifies
the “best” gradient method for (P) under assumptions about the computational cost
of reducing the evaluation error, the convexity in the first and concavity in the second argument
of $\phi(\cdot, \cdot)$, convexity of X and Y, and the use of specific gradient methods.

Rate  of  convergence  analysis  in  terms  of  a  computing  budget  is  common  in  other  areas 
such  as  Monte  Carlo  simulation  and  simulation  optimization;  see  [17]  for  a  review.  In  those 
areas, given a computing budget, the goal is to optimally allocate it across different tasks
within  the  simulation  and  to  determine  the  resulting  rate  of  convergence  of  an  estimator 
as  the  computing  budget  tends  to  infinity.  The  allocation  may  be  between  exploration 
of  new  points  and  estimation  of  objective  function  values  at  known  points  as  in  global 
optimization  [18,  19]  and  stochastic  programming  [20,  21],  between  estimation  of  different 
random  variables  nested  by  conditioning  [22],  or  between  performance  estimation  of  different 
systems as in ranking and selection [23]. Even though these studies deal with rather different
applications  than  semi-infinite  minimax  problems,  they  motivate  the  present  paper.  The 
present  paper  is  most  closely  related  to  the  recent  paper  [21],  where  the  authors  consider 
the  sample  average  approximation  approach  to  solving  stochastic  programs.  That  approach 
replaces  an  expectation  in  the  objective  function  of  the  stochastic  program  by  a  sample 
average  and  then  proceeds  by  solving  the  sample  average  problem  using  an  optimization 
algorithm.  They  consider  sublinearly,  linearly,  and  superlinearly  convergent  optimization 
algorithms  for  solving  the  sample  average  problem,  determine  optimal  policies  for  allocating 
a  computing  budget  between  sampling  and  optimization,  and  quantify  the  associated  rate  of 
convergence  of  the  sample  average  approximation  approach  as  the  computing  budget  tends 
to  infinity.  The  present  paper  has  the  same  goals,  but  in  the  context  of  semi-infinite  minimax 
problems.  Our  treatment  of  sublinear,  linear,  and  superlinear  optimization  algorithms  for 
solving  the  finite  minimax  problems  is  similar  to  the  parallel  development  in  [21],  but  is 
carried  out  with  different  assumptions.  The  conclusions  are  naturally  somewhat  different. 
We  also  deal  with  exponential  smoothing  algorithms  for  solving  the  finite  minimax  problem, 
a  topic  not  relevant  in  the  case  of  stochastic  programming. 

The next section presents the finite minimax problem corresponding to (P) and associated
assumptions. Section 3 considers finite, superlinear, linear, and sublinear algorithms
for solving the finite minimax problem and determines optimal discretization policies with
corresponding rates of convergence as the computing budget tends to infinity. Section 4
deals with the solution of the finite minimax problem by exponential smoothing algorithms,
constructs an optimal discretization and smoothing policy, and determines the corresponding
rate of convergence as the computing budget tends to infinity. The paper ends with
concluding remarks in Section 5.

2  Discretization  and  Assumptions 

Discretization algorithms for solving (P) replace Y by a finite subset $Y_N \subset Y$ of cardinality
$N \in \mathbb{N} = \{1, 2, 3, \ldots\}$ and approximately solve the resulting finite minimax problem

$$(P_N) \quad \min_{x \in X} \psi_N(x),$$

where $\psi_N : \mathbb{R}^d \to \mathbb{R}$ is defined by

$$\psi_N(x) = \max_{y \in Y_N} \phi(x, y).$$

Clearly, when $\phi(\cdot, y)$ is smooth for all $y \in Y_N$, $(P_N)$ is solvable by numerous nonlinear
programming and finite minimax algorithms; see for example [11, 12].

The relationship between $\psi(\cdot)$ and $\psi_N(\cdot)$ depends on the properties of $\phi(\cdot, \cdot)$ and $Y_N$.
We adopt the following assumption.

Assumption 1. We assume that the following hold:

(i) The set of optimal solutions $X^*$ of (P) is nonempty.

(ii) There exists a constant $L \in [0, \infty)$ such that

$$|\phi(x, y) - \phi(x, y')| \le L \|y - y'\|,$$

for all $x \in X$ and $y, y' \in Y$.

(iii) There exist constants $\bar{N} \in \mathbb{N}$ and $K \in [0, \infty)$ such that (a) the set of optimal solutions
$X_N^*$ of $(P_N)$ is nonempty for all $N \ge \bar{N}$, $N \in \mathbb{N}$, and (b) for every $N \ge \bar{N}$, $N \in \mathbb{N}$,
and $y \in Y$, there exists a $y' \in Y_N$ with $\|y - y'\| \le K/N^{1/m}$. □

Part (b) of item (iii) holds, for example, when Y is the unit hypercube in m dimensions
and the discretization scheme is uniform across Y, in which case $\bar{N} = 2^m$ and $K = m^{1/2}$; see
[24]. The next result is a simple extension of Lemma 3.4.3 in [8], where we use the notation
$\psi^*$ and $\psi_N^*$ to denote the optimal values of (P) and $(P_N)$, respectively.

Proposition 1. Suppose that Assumption 1 holds. Then,

$$0 \le \psi(x) - \psi_N(x) \le LK/N^{1/m}, \qquad (2)$$

for all $x \in X$, $N \in \mathbb{N}$, $N \ge \bar{N}$, where L, K, and $\bar{N}$ are as in Assumption 1.

Moreover,

$$0 \le \psi^* - \psi_N^* \le LK/N^{1/m},$$

for all $N \in \mathbb{N}$, $N \ge \bar{N}$. □

We refer to

$$\psi(x) - \psi_N(x)$$

as the discretization error. In view of Proposition 1, the discretization error is of order
$O(N^{-1/m})$ and the optimal value of $(P_N)$ tends to that of (P) at least at rate $N^{-1/m}$, as
$N \to \infty$.
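
The bound in Proposition 1 can be checked numerically on a toy instance. The following Python sketch is illustrative only: the function `phi`, the cell-centered grid, and all helper names are assumptions introduced here and are not part of the paper. It takes Y to be the unit square (m = 2), uses L = 1 and K = m^{1/2}, and compares the discretization error with LK/N^{1/m}.

```python
import itertools
import math


def uniform_grid(points_per_axis, m):
    """Cell-centered uniform grid with points_per_axis**m points on [0, 1]^m."""
    axis = [(i + 0.5) / points_per_axis for i in range(points_per_axis)]
    return list(itertools.product(axis, repeat=m))


def phi(x, y):
    """Toy function (assumed for illustration); Lipschitz in y with L = 1."""
    return x * x + sum(y) / math.sqrt(len(y))


def psi(x, m):
    """Exact maximum of phi(x, .) over [0, 1]^m, attained at y = (1, ..., 1)."""
    return x * x + math.sqrt(m)


def psi_N(x, grid):
    """Maximum of phi(x, .) over the finite set Y_N."""
    return max(phi(x, y) for y in grid)


def discretization_report(x=0.3, m=2, L=1.0, per_axis=(2, 4, 8, 16)):
    """Return (N, discretization error, bound LK / N^(1/m)) for each grid."""
    K = math.sqrt(m)  # value of K quoted in the text for the unit hypercube
    rows = []
    for k in per_axis:
        grid = uniform_grid(k, m)
        N = len(grid)  # N = k^m discretization points
        error = psi(x, m) - psi_N(x, grid)
        bound = L * K / N ** (1.0 / m)
        rows.append((N, error, bound))
    return rows


for N, error, bound in discretization_report():
    print(f"N={N:4d}  error={error:.4f}  bound={bound:.4f}")
```

For this particular toy function the error is nonnegative and stays below the bound on every grid, consistent with the $O(N^{-1/m})$ order stated above.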

Unless X and $\phi(\cdot, y)$, $y \in Y_N$, have special structures, one cannot expect to obtain a
globally optimal solution of $(P_N)$ in finite computing time. Hence, after a finite number
of iterations of an optimization algorithm applied to $(P_N)$, there is typically a remaining
optimization error. Specifically, given an optimization algorithm A for $(P_N)$, let $x_N^n \in X$ be
the iterate$^1$ obtained by A after n iterations when applied to $(P_N)$. Then the optimization
error is defined as

$$\psi_N(x_N^n) - \psi_N^*.$$

The rate with which the optimization error decays as n grows depends on the rate of convergence
of A when applied to $(P_N)$. Here and throughout the paper, we only consider
algorithms that generate iterates in X exclusively, which is stated in the next assumption.

Assumption 2. We assume that if $\{x_N^n\}_{n=0}^\infty$, $N \in \mathbb{N}$, are generated by a given optimization
algorithm when applied to $(P_N)$, then $x_N^n \in X$ for all $N \in \mathbb{N}$ and $n = 0, 1, 2, \ldots$. □

In view of the assumed simplicity of X, essentially all relevant optimization algorithms
satisfy Assumption 2. We also define the total error as

$$\psi(x_N^n) - \psi^*,$$

which measures the quality of the obtained solution after n iterations of the given optimization
algorithm applied to $(P_N)$. In view of Assumptions 1 and 2 and Proposition 1,

$$0 \le \psi(x_N^n) - \psi^* = \psi(x_N^n) - \psi_N(x_N^n) + \psi_N(x_N^n) - \psi_N^* - \psi^* + \psi_N^*$$
$$\le LK/N^{1/m} + \Delta_N^n(A), \qquad (3)$$

where $\Delta_N^n(A)$ is an upper bound on the optimization error after n iterations of optimization
algorithm A applied to $(P_N)$. Below, we discuss several different expressions for $\Delta_N^n(A)$ under
various assumptions about the optimization algorithm and effectively also about $(P_N)$. Since
it appears difficult to quantify the rate of convergence of the total error, we focus on the rate
of convergence of its upper bound in (3) as described next. The rate of convergence of that
bound provides a guaranteed minimum rate of convergence of the total error.

We see from (3) that different choices of N and n may result in different bounds on
the total error. Let $b \in \mathbb{N}$ be the computing budget available for executing n iterations of
the selected optimization algorithm on $(P_N)$. Clearly, the choice of N and n would typically
depend on b and we write $N_b$ and $n_b$ to stress this dependence. We refer to $\{(n_b, N_b)\}_{b=1}^\infty$, with
$n_b, N_b \in \mathbb{N}$ for all $b \in \mathbb{N}$, as a discretization policy. A discretization policy specifies the level
of discretization of Y and the number of iterations of the optimization algorithm to execute
for any computing budget. If $n_b, N_b \to \infty$, as $b \to \infty$, then the bound on the discretization
error vanishes; see Proposition 1. Assuming that the optimization algorithm converges to a global
minimizer of $(P_N)$, the optimization error and, presumably, the corresponding bound vanish
too. For a given optimization algorithm A and $n, N \in \mathbb{N}$, we define the total error bound,
denoted by $e(A, n, N)$, as the right-hand side of (3), i.e.,

$$e(A, n, N) = LK/N^{1/m} + \Delta_N^n(A). \qquad (4)$$
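
The dependence of the total error bound on the split between n and N can be made concrete with a short Python sketch. Everything below is a hypothetical illustration: the optimization error bound `halving` (which halves in every iteration), the work model n·M·N^ν with M = ν = 1, and the function names are assumptions introduced here, not constructions from the paper.

```python
def total_error_bound(n, N, opt_error_bound, L=1.0, K=1.0, m=1):
    """Right-hand side of (4): LK / N^(1/m) plus the optimization error bound."""
    return L * K / N ** (1.0 / m) + opt_error_bound(n, N)


def best_split(b, opt_error_bound, M=1.0, nu=1.0, **kwargs):
    """Enumerate n and pair it with roughly the largest N with n*M*N^nu <= b."""
    best = None
    for n in range(1, b + 1):
        N = int((b / (n * M)) ** (1.0 / nu))
        if N < 1:
            break
        e = total_error_bound(n, N, opt_error_bound, **kwargs)
        if best is None or e < best[0]:
            best = (e, n, N)
    return best


def halving(n, N):
    """Hypothetical optimization error bound that halves in every iteration."""
    return 0.5 ** n


e, n, N = best_split(1000, halving)
print(n, N, e)
```

Spending the whole budget on discretization (n = 1) or on iterations (N = 1) gives a much larger bound than the balanced split found by the enumeration, which is the trade-off that a discretization policy has to resolve.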

1  Iterates  may  depend  on  quantities  such  as  algorithm  parameters  and  the  initial  point  used.  In  this  paper,  we 
view  the  specification  of  such  quantities  as  part  of  the  algorithm  and  therefore  do  not  reference  them  directly. 

In this paper, we examine the rate at which the total error bound $e(A, n_b, N_b)$ vanishes
as b tends to infinity for different discretization policies $\{(n_b, N_b)\}_{b=1}^\infty$ and optimization algorithms
A. We identify optimal discretization policies, which as precisely stated below attain
the highest possible rate of convergence of the total error bound as the computing budget
tends to infinity for a given class of optimization algorithms.

Our analysis relies on the following assumption about the computational work needed
by an optimization algorithm to carry out n iterations on $(P_N)$.

Assumption 3. There exist constants $M = M(A, d) \in (0, \infty)$ and $\nu = \nu(A) \in [1, \infty)$ such
that the computational work required by a given optimization algorithm A to carry out $n \in \mathbb{N}$
iterations on $(P_N)$ (of dimension d), $N \in \mathbb{N}$, is no larger than $n M N^\nu$. □

Assumption 3 holds with ν = 1 if the optimization algorithm A is a subgradient or
smoothing algorithm (see [24]) as each iteration of these algorithms requires the calculation of
$\psi_N(x)$ at the current iterate $x \in X$ (which involves finding the maximum over N scalars) and
the evaluation of gradients $\nabla_x \phi(x, y)$ for one $y \in Y_N$ in the subgradient method and for all
$y \in Y_N$ in a smoothing algorithm. The constant M may be of order $O(d)$ as $\nabla_x \phi(x, y) \in \mathbb{R}^d$
or, possibly, proportional to another function of d depending on the structure of X and
$\phi(\cdot, y)$, $y \in Y$. Other optimization algorithms for $(P_N)$ tend to result in larger values of ν
and M. For example, the sequential quadratic programming (SQP) algorithm in [11] and
the Pshenichnyi-Pironneau-Polak (PPP) algorithm [8, Section 2.4] for solving finite minimax
problems require the solution of one or two convex quadratic programs (QPs) with d + 1
variables and N linear inequality constraints in each iteration. A QP solver based on an
interior point method may need $O(d^2 N)$ operations per iteration when $N \ge d$ [25]. The
number of iterations required by an interior point method on such QPs could be of order
$O(\sqrt{d + N})$ [26], or even less in practice when using a good method. Hence, M may be of
order $O(d^2)$, or possibly larger depending on the structure of X and $\phi(\cdot, y)$, $y \in Y$, and ν
may be 1.5.

We note that computational savings have been observed empirically with the use of
active-set strategies when solving $(P_N)$ as well as any QP encountered in the process; see
[11, 12, 25, 27]. While of practical importance, in this paper we ignore this possibility as the
effect of active-set strategies in worst-case rate analysis is unclear.

In view of Assumption 3, we refer to a discretization policy $\{(n_b, N_b)\}_{b=1}^\infty$ as asymptotically
admissible if $n_b M N_b^\nu / b \to 1$, as $b \to \infty$. Clearly, an asymptotically admissible
discretization policy satisfies the computing budget in the limit as b tends to infinity. In the
next two sections, we determine optimal asymptotically admissible discretization policies
and corresponding rates of convergence of the total error bound under different assumptions
about the optimization algorithm and consequently the optimization error bound $\Delta_N^n(A)$.
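
As a minimal sketch of asymptotic admissibility, the Python fragment below builds one simple policy and evaluates the ratio $n_b M N_b^\nu / b$ for growing budgets. The policy (a fixed number of iterations with the remaining budget spent on discretization), the parameter values M = 2 and ν = 1.5, and the function names are assumptions chosen for illustration only.

```python
import math


def work(n, N, M, nu):
    """Work bound of Assumption 3 for n iterations on (P_N): n * M * N^nu."""
    return n * M * N ** nu


def fixed_iteration_policy(b, n_fixed, M, nu):
    """Hypothetical policy: keep n_b fixed and spend the rest of b on N_b."""
    N_b = max(1, math.floor((b / (n_fixed * M)) ** (1.0 / nu)))
    return n_fixed, N_b


def admissibility_ratio(b, n_b, N_b, M, nu):
    """The ratio n_b * M * N_b^nu / b, which should tend to 1 as b grows."""
    return work(n_b, N_b, M, nu) / b


for exponent in (2, 4, 6, 8):
    b = 10 ** exponent
    n_b, N_b = fixed_iteration_policy(b, n_fixed=5, M=2.0, nu=1.5)
    print(b, n_b, N_b, round(admissibility_ratio(b, n_b, N_b, 2.0, 1.5), 6))
```

Rounding $N_b$ down to an integer leaves part of a small budget unused, but the ratio approaches 1 as b grows, which is all that asymptotic admissibility requires.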

3  Finite,  Superlinear,  Linear,  and  Sublinear  Algorithms

We see from (4) that the total error bound consists of discretization and optimization error
bounds. The discretization error bound depends on the discretization level N, but not on the
optimization algorithm used; see Proposition 1. The optimization error bound depends on
the rate of convergence of the optimization algorithm used to solve $(P_N)$. In this section, we
consider four cases: First, we assume that the optimization algorithm solves $(P_N)$ in a finite
number of iterations. Second, we consider optimization algorithms with a superlinear rate
of convergence towards an optimal solution of $(P_N)$. Third, we deal with linearly convergent
optimization algorithms. Fourth, we assume a sublinearly convergent algorithm. We observe
that in practice an assumption about the rate of convergence of an optimization algorithm
when applied to $(P_N)$ would indirectly imply certain properties of $(P_N)$ such as convexity.


3.1  Finite  Optimization  Algorithm 

Suppose that the optimization algorithm for solving $(P_N)$ is guaranteed to obtain an optimal
solution in a finite number of iterations independently of N as defined precisely next.

Definition 1. An optimization algorithm A converges finitely on $\{(P_N)\}_{N=\bar{N}}^\infty$ when $X_N^*$ is
nonempty for $N \ge \bar{N}$ and there exists a constant $\bar{n} \in \mathbb{N}$ such that for all $N \ge \bar{N}$, $N \in \mathbb{N}$, a
sequence generated by A when applied to $(P_N)$ satisfies $x_N^n \in X_N^*$ for all $n \ge \bar{n}$. □


No optimization algorithm converges finitely on $\{(P_N)\}_{N=\bar{N}}^\infty$ without strong structural
assumptions on X, $\phi(\cdot, \cdot)$, and Y such as linearity. In this paper, we are not interested in
instances of $(P_N)$ in the form of linear programs, for which finite convergence may be possible,
but include this case here as an “ideal” case. As we see below, the case provides an upper
bound on the rate of convergence of the total error bound using any optimization algorithm.
In view of Definition 1, a finitely convergent optimization algorithm $A^{\mathrm{finite}}$ on $\{(P_N)\}_{N=\bar{N}}^\infty$
has no optimization error after a sufficiently large number of iterations. Hence, we define
$\Delta_N^n(A^{\mathrm{finite}}) = 0$ and $e(A^{\mathrm{finite}}, n, N) = LK/N^{1/m}$ for $n \ge \bar{n}$ and $N \ge \bar{N}$, where L and K are
as in Assumption 1 and $\bar{n}$ and $\bar{N}$ as in Definition 1. Naturally, one can in this case let the
portion of the computing budget allocated to discretization tend to 1, as $b \to \infty$. The next
theorem states the rate of convergence of the total error bound in this case.


Theorem 1. Suppose that Assumption 1 holds and that $A^{\mathrm{finite}}$ is a finitely convergent algorithm
on $\{(P_N)\}_{N=\bar{N}}^\infty$, with $\bar{N}$ as in Assumption 1 and number of required iterations $\bar{n}$ as in
Definition 1. Suppose also that $A^{\mathrm{finite}}$ satisfies Assumptions 2 and 3. If $\{(n_b, N_b)\}_{b=1}^\infty$ is an
asymptotically admissible discretization policy with $n_b = \bar{n}$ for all $b \in \mathbb{N}$, then

$$\lim_{b \to \infty} \frac{\log e(A^{\mathrm{finite}}, n_b, N_b)}{\log b} = -\frac{1}{m\nu},$$

where ν is as in Assumption 3 and m is the uncertainty dimension.


Proof. Since $\{(n_b, N_b)\}_{b=1}^\infty$ is asymptotically admissible, $n_b M N_b^\nu / b = \bar{n} M N_b^\nu / b \to 1$, as
$b \to \infty$, and we have that $N_b \to \infty$, as $b \to \infty$. Here M is as in Assumption 3. Hence, for
sufficiently large b, $\Delta_{N_b}^{n_b}(A^{\mathrm{finite}}) = 0$ and $e(A^{\mathrm{finite}}, n_b, N_b) = LK/N_b^{1/m}$, where L and K are as
in Assumption 1. Consequently, for sufficiently large b,

$$\log e(A^{\mathrm{finite}}, n_b, N_b) = \log LK - \frac{1}{\nu m} \log b + \frac{1}{\nu m} \log \bar{n} M - \frac{1}{\nu m} \log\left(\frac{\bar{n} M N_b^\nu}{b}\right).$$

Since $\bar{n} M N_b^\nu / b \to 1$ as $b \to \infty$, the conclusion follows after dividing by log b and taking
limits. □

Theorem 1 gives the asymptotic rate of decay of $e(A^{\mathrm{finite}}, n_b, N_b)$ on a logarithmic scale
as b tends to infinity. We say in this case that the discretization algorithm and its total error
bound $e(A^{\mathrm{finite}}, n_b, N_b)$ converge at rate $b^{-1/(m\nu)}$. Similar statements below are referenced
likewise.
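
The logarithmic-scale statement of Theorem 1 can be illustrated numerically. The sketch below is a simplified illustration under stated assumptions: $N_b$ is treated as a real number solving $\bar{n} M N_b^\nu = b$ exactly (integrality is ignored), and the values of $\bar{n}$, M, ν, m, L, and K are hypothetical.

```python
import math


def log_rate_finite(b, n_bar, M, nu, m, L=1.0, K=1.0):
    """log e(A_finite, n_b, N_b) / log b with n_b = n_bar and real-valued N_b."""
    N_b = (b / (n_bar * M)) ** (1.0 / nu)  # solves n_bar * M * N_b^nu = b
    e = L * K / N_b ** (1.0 / m)           # only the discretization bound remains
    return math.log(e) / math.log(b)


target = -1.0 / (2 * 1.5)  # -1 / (m * nu) for m = 2, nu = 1.5
for k in (10, 100, 300):
    rate = log_rate_finite(10.0 ** k, n_bar=10, M=3.0, nu=1.5, m=2)
    print(f"b=1e{k}: log e / log b = {rate:.5f} (limit {target:.5f})")
```

The ratio approaches $-1/(m\nu)$ only slowly, because the constant terms in $\log e$ are divided by $\log b$; this is why the rate is stated on a logarithmic scale.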

For any discretization policy satisfying $n_b M N_b^\nu \le b$ for all $b \in \mathbb{N}$ and $M \ge 1$, $N_b \le b^{1/\nu}$
for all $b \in \mathbb{N}$. Hence, in view of Proposition 1, the optimal value of $(P_N)$ and the discretization
error converge at rate $N_b^{-1/m} \ge b^{-1/(m\nu)}$, as $b \to \infty$. Hence, the discretization error cannot
converge at a faster rate than that stipulated in Theorem 1. Since the total error bound
includes the discretization error bound (see (4)), the total error bound cannot converge
faster than the rate $b^{-1/(m\nu)}$ regardless of the optimization algorithm used to solve $(P_N)$.

The asymptotically admissible discretization policy stated in Theorem 1 is problematic to
implement as $\bar{n}$ may be unknown. Still, the resulting rate is an upper bound on the rate
that can be obtained by any optimization algorithm and therefore provides a benchmark for
comparison.


3.2  Superlinear  Optimization  Algorithm 

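We next consider superlinearly convergent optimization algorithms, defined as follows.

Definition 2. An optimization algorithm A converges superlinearly with order $\gamma \in (1, \infty)$ on
$\{(P_N)\}_{N=\bar{N}}^\infty$ when $X_N^*$ is nonempty for $N \ge \bar{N}$ and there exist constants $\bar{n} \in \mathbb{N}$, $c \in [0, \infty)$,
and $\rho \in [0, 1)$ such that $c^{1/(\gamma-1)}(\psi_N(x_N^{\bar{n}}) - \psi_N^*) \le \rho$ and

$$\psi_N(x_N^{n+1}) - \psi_N^* \le c\,(\psi_N(x_N^n) - \psi_N^*)^\gamma$$

for all $n \ge \bar{n}$, $n \in \mathbb{N}$, and $N \ge \bar{N}$, $N \in \mathbb{N}$. □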

Definition 2 requires the optimization algorithm to attain a superlinear rate of convergence
for sufficiently large n, which is typically the case for Newtonian methods applied to
strongly convex instances of $(P_N)$ with twice Lipschitz continuously differentiable functions.
For example, the Polak-Mayne-Higgins Algorithm (see Algorithm 2.5.10 of [8]) attains a superlinear
rate of order γ = 3/2. The SQP algorithm of [11] also achieves a superlinear rate
of convergence, but its order appears unknown. Definition 2 requires that the superlinear
regime starts no later than an iteration number independent of N. Assuming that the algorithm
is initiated at a point independent of N, this is obtained in the Polak-Mayne-Higgins
Algorithm if the Lipschitz constant of $\nabla^2_{xx}\phi(\cdot, \cdot)$ with respect to its first argument is bounded
on X × Y and the eigenvalues of $\nabla^2_{xx}\phi(x, y)$ for all $x \in X$ and $y \in Y$ are positive, bounded
from above, and away from zero.

The next lemma identifies a total error bound for a superlinearly convergent algorithm.

Lemma 1. Suppose that Assumption 1 holds and that $A^{\mathrm{super}}$ is a superlinearly convergent
algorithm with order $\gamma \in (1, \infty)$ on $\{(P_N)\}_{N=\bar{N}}^\infty$, with $\bar{N}$ as in Assumption 1. Let $\{x_N^n\}_{n=0}^\infty$
be the iterates generated by $A^{\mathrm{super}}$ when applied to $(P_N)$, $N \in \mathbb{N}$, $N \ge \bar{N}$. Suppose also that
$A^{\mathrm{super}}$ satisfies Assumption 2. Then, there exist constants $c \in [0, 1)$, $\kappa \in [0, \infty)$, and $\bar{n} \in \mathbb{N}$
such that

$$\psi(x_N^n) - \psi^* \le c^{\gamma^n} \kappa + LK/N^{1/m}$$

for all $n \ge \bar{n}$, $n \in \mathbb{N}$, and $N \ge \bar{N}$, $N \in \mathbb{N}$, where L and K are as in Assumption 1 and m
is the uncertainty dimension.

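Proof. Based on Proposition 1 and Definition 2, there exists an $\bar{n} \in \mathbb{N}$ such that

$$\psi(x_N^n) - \psi^* \le \psi_N(x_N^n) + LK/N^{1/m} - \psi_N^*$$
$$\le c^{-1/(\gamma-1)} \left(c^{1/(\gamma-1)}(\psi_N(x_N^{\bar{n}}) - \psi_N^*)\right)^{\gamma^{n-\bar{n}}} + LK/N^{1/m}$$
$$= c^{-1/(\gamma-1)} \left(\left(c^{1/(\gamma-1)}(\psi_N(x_N^{\bar{n}}) - \psi_N^*)\right)^{\gamma^{-\bar{n}}}\right)^{\gamma^n} + LK/N^{1/m}$$
$$\le c^{-1/(\gamma-1)} \left(\rho^{\gamma^{-\bar{n}}}\right)^{\gamma^n} + LK/N^{1/m}$$

for $N \ge \bar{N}$, $N \in \mathbb{N}$, and $n \ge \bar{n}$, $n \in \mathbb{N}$, with c and ρ as in Definition 2. Consequently, the
conclusion holds with the constant c of the lemma equal to $\rho^{\gamma^{-\bar{n}}}$ and $\kappa = c^{-1/(\gamma-1)}$, where the c in κ is the constant of Definition 2. □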
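
The key step in this argument is that a recursion of the form $e_{n+1} \le c\, e_n^\gamma$ yields a doubly exponential decay once the error is rescaled by $c^{1/(\gamma-1)}$. The following Python sketch checks this in the worst case, where the recursion holds with equality. The starting error, c, and γ are hypothetical values, and the function names are introduced here for illustration.

```python
import math


def superlinear_errors(e_start, c, gamma, steps):
    """Worst case of the superlinear recursion: e_{k+1} = c * e_k^gamma."""
    errors = [e_start]
    for _ in range(steps):
        errors.append(c * errors[-1] ** gamma)
    return errors


def closed_form(e_start, c, gamma, k):
    """c^(-1/(gamma-1)) * (c^(1/(gamma-1)) * e_start)^(gamma^k)."""
    scale = c ** (1.0 / (gamma - 1.0))
    return (scale * e_start) ** (gamma ** k) / scale


errors = superlinear_errors(e_start=0.1, c=2.0, gamma=1.5, steps=8)
for k, e_k in enumerate(errors):
    agrees = math.isclose(e_k, closed_form(0.1, 2.0, 1.5, k), rel_tol=1e-9)
    print(k, e_k, agrees)
```

In this example the rescaled starting error is 4 × 0.1 = 0.4 < 1, which plays the role of ρ; without that condition the recursion would not force the error to decrease.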

In view of Lemma 1, we adopt the upper bound on the optimization error

$$\Delta_N^n(A^{\mathrm{super}}) = c^{\gamma^n} \kappa,$$

for a superlinearly convergent optimization algorithm $A^{\mathrm{super}}$ on $\{(P_N)\}_{N=\bar{N}}^\infty$, where c and κ
are as in Lemma 1. Consequently, for $n, N \in \mathbb{N}$, we define the total error bound

$$e(A^{\mathrm{super}}, n, N) = c^{\gamma^n} \kappa + KL/N^{1/m}.$$

The next result states that if we choose a particular discretization policy, then a superlinearly
convergent optimization algorithm results in the same rate of convergence of the total error
bound as a finitely convergent algorithm. Hence, the policy stipulated next is optimal in the
sense that no other policy guarantees a better rate of convergence.

Theorem 2. Suppose that $A^{\mathrm{super}}$ satisfies the assumptions of Lemma 1 and, in addition,
Assumption 3 holds. If $\{(n_b, N_b)\}_{b=1}^\infty$ is an asymptotically admissible discretization policy
with $n_b / \log\log b \to a \in (1/\log\gamma, \infty)$, then

$$\lim_{b \to \infty} \frac{\log e(A^{\mathrm{super}}, n_b, N_b)}{\log b} = -\frac{1}{m\nu},$$

where ν is as defined in Assumption 3 and m is the uncertainty dimension.


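Proof. Straightforward algebraic manipulation gives that

\begin{align*}
\frac{KL}{N^{1/m}}
&= \exp\left(\log KL - \frac{1}{m\nu}\log\left(\frac{nMN^\nu}{b}\right)
   - \frac{1}{m\nu}\log\left(\frac{b}{\log\log b}\right)
   - \frac{1}{m\nu}\log\left(\frac{\log\log b}{nM}\right)\right) \\
&= \exp\left(\log KL - \frac{1}{m\nu}\log\left(\frac{nMN^\nu}{b}\right)
   - \frac{1}{m\nu}\log b + \frac{1}{m\nu}\log\log\log b
   - \frac{1}{m\nu}\log\left(\frac{\log\log b}{nM}\right)\right),
\end{align*}

where $M$ is as in Assumption 3, and

\[ \kappa c^{\gamma^n} = \exp\left(\log\kappa + \gamma^n\log c\right)
   = \exp\left(\log\kappa + \log c\,(\log b)^{n\log\gamma/\log\log b}\right). \]

Hence,

\begin{align*}
e(\mathcal{A}^{\rm super}, n, N)
= \exp\left(-\frac{1}{m\nu}(\log b - \log\log\log b)\right)
\Bigg[ &\exp\left(\log KL - \frac{1}{m\nu}\log\left(\frac{nMN^\nu}{b}\right)
        - \frac{1}{m\nu}\log\left(\frac{\log\log b}{nM}\right)\right) \\
&+ \exp\left(\log\kappa + \log b\left(\frac{1}{m\nu}
        + \log c\,(\log b)^{n\log\gamma/\log\log b - 1}\right)
        - \frac{1}{m\nu}\log\log\log b\right)\Bigg].
\end{align*}

Consequently,

\begin{align*}
\frac{\log e(\mathcal{A}^{\rm super}, n, N)}{\log b}
= -\frac{1}{m\nu} + \frac{1}{m\nu}\frac{\log\log\log b}{\log b}
+ \log\Bigg[ &\exp\left(\log KL - \frac{1}{m\nu}\log\left(\frac{nMN^\nu}{b}\right)
        - \frac{1}{m\nu}\log\left(\frac{\log\log b}{nM}\right)\right) \\
&+ \exp\left(\log\kappa + \log b\left(\frac{1}{m\nu}
        + \log c\,(\log b)^{n\log\gamma/\log\log b - 1}\right)
        - \frac{1}{m\nu}\log\log\log b\right)\Bigg] \Big/ \log b. \tag{5}
\end{align*}

Since $n_b M N_b^\nu/b \to 1$, $\log\log b/n_b \to 1/a$, and, due to the facts that $a\log\gamma - 1 > 0$ and
$\log c < 0$, $\log c\,(\log b)^{n_b\log\gamma/\log\log b - 1} \to -\infty$, as $b \to \infty$, we obtain that the expression in
brackets in (5), with $n$ and $N$ replaced by $n_b$ and $N_b$, respectively, tends to a constant as
$b \to \infty$. The conclusion then follows from taking limits of the other terms as well. □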

It is clear from Theorem 2 and its proof that choices of discretization policy other than
the one recommended may result in a significantly slower rate of convergence of the total error
bound as the computing budget tends to infinity.
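
As a rough numerical illustration of this remark, the sketch below evaluates the total error bound $e(\mathcal{A}^{\rm super}, n, N)$ under the policy $n_b = a \log\log b$, with $N_b$ chosen so that $n_b M N_b^\nu = b$. Treating $n_b$ and $N_b$ as real numbers, the parameter values ($\gamma = 2$, $c = 0.5$, $\kappa = KL = M = 1$, $m = 2$, $\nu = 1.5$), and the function names are illustrative assumptions, not part of the analysis above.

```python
import math

def total_error_super(b, a, gamma=2.0, c=0.5, kappa=1.0, KL=1.0,
                      m=2, nu=1.5, M=1.0):
    """Bound c^(gamma^n)*kappa + KL/N^(1/m) under the policy n = a*log(log(b))."""
    n = a * math.log(math.log(b))      # real-valued iteration count (assumption)
    N = (b / (M * n)) ** (1.0 / nu)    # spend the whole budget: n*M*N^nu = b
    # Work in log space; math.exp quietly returns 0.0 when the term underflows.
    opt_err = kappa * math.exp(gamma ** n * math.log(c))
    return opt_err + KL / N ** (1.0 / m)

def empirical_rate(err, b1, b2):
    """Slope of log(err) against log(b) between two budgets."""
    return (math.log(err(b2)) - math.log(err(b1))) / (math.log(b2) - math.log(b1))

# a = 3 lies above the threshold 1/log(gamma) (about 1.44); a = 1 lies below it.
rate_good = empirical_rate(lambda b: total_error_super(b, a=3.0), 1e50, 1e100)
rate_bad = empirical_rate(lambda b: total_error_super(b, a=1.0), 1e50, 1e100)
print(rate_good, rate_bad, -1.0 / (2 * 1.5))
```

With these illustrative values, the first slope should be close to $-1/(m\nu)$, while the second should be markedly closer to zero because the optimization error then dominates.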

3.3 Linear Optimization Algorithm

We  next  consider  a  linearly  convergent  optimization  algorithm  defined  as  follows. 

Definition 3. An optimization algorithm $\mathcal{A}$ converges linearly on $\{(P_N)\}_{N=\bar N}^\infty$ when $X_N^*$ is
nonempty for $N \ge \bar N$ and there exist constants $\bar n \in \mathbb{N}$ and $c \in [0,1)$ such that

\[ \frac{\psi_N(x_N^{n+1}) - \psi_N^*}{\psi_N(x_N^n) - \psi_N^*} \le c \]

for all $n \ge \bar n$, $n \in \mathbb{N}$, and $N \ge \bar N$, $N \in \mathbb{N}$. □

The definition requires that the rate of convergence coefficient $c$ holds for all $N$ sufficiently
large. This is satisfied, for example, in the PPP algorithm when the eigenvalues of
$\nabla_{xx}^2\phi(x, y)$ for all $x \in X$ and $y \in Y$ are positive, bounded from above, and away from zero
[8, Section 2.4].

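Lemma 2. Suppose that Assumption 1 holds and that $\mathcal{A}^{\rm linear}$ is a linearly convergent algorithm
on $\{(P_N)\}_{N=\bar N}^\infty$, with $\bar N$ as in Assumption 1. Let $\{x_N^n\}_{n=0}^\infty$ be the iterates generated
by $\mathcal{A}^{\rm linear}$ when applied to $(P_N)$, $N \in \mathbb{N}$, $N \ge \bar N$. Suppose also that there exists a constant
$C \in \mathbb{R}$ such that $\psi_N(x_N^n) \le C$ for all $n \in \mathbb{N}$ and $N \ge \bar N$, $N \in \mathbb{N}$, and that $\mathcal{A}^{\rm linear}$ satisfies
Assumption 2. Then, there exists a constant $\kappa \in [0, \infty)$ such that

\[ \psi(x_N^n) - \psi^* \le c^n\kappa + LK/N^{1/m} \]

for all $n \ge \bar n$ and $N \ge \bar N$, where $c$ and $\bar n$ are as in Definition 3, and $K$ and $L$ are as in
Assumption 1.

Proof. Based on Proposition 1 and the fact that $\mathcal{A}^{\rm linear}$ is linearly convergent, we obtain
that

\begin{align*}
\psi(x_N^n) - \psi^*
&\le \psi_N(x_N^n) + LK/N^{1/m} - \psi_N^* \\
&\le c^{n-\bar n}\big(\psi_N(x_N^{\bar n}) - \psi_N^*\big) + LK/N^{1/m} \\
&\le c^n\big(c^{-\bar n}(C - \psi^* + LK/\bar N^{1/m})\big) + LK/N^{1/m}.
\end{align*}

Hence, the result holds with $\kappa = c^{-\bar n}(C - \psi^* + LK/\bar N^{1/m})$. □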

We note that the assumption $\psi_N(x_N^n) \le C$ for all $n \in \mathbb{N}$ and $N \in \mathbb{N}$, $N \ge \bar N$, in Lemma
2 is rather weak and is satisfied, for example, if the optimization algorithm starts with $x^0 \in X$
regardless of $N$ and is a descent algorithm, because then $\psi_N(x_N^n) \le \psi_N(x^0) \le \psi(x^0)$. In view
of Lemma 2, we define the optimization error bound for a linearly convergent optimization
algorithm $\mathcal{A}^{\rm linear}$ to be

\[ \Delta_N^n(\mathcal{A}^{\rm linear}) = c^n\kappa, \]

where $c$ and $\kappa$ are as in Lemma 2, and the total error bound for $n, N \in \mathbb{N}$ to be

\[ e(\mathcal{A}^{\rm linear}, n, N) = c^n\kappa + LK/N^{1/m}. \]




The next result states that a linearly convergent optimization algorithm also attains the
best possible rate of convergence of the total error bound given in Theorems 1 and 2 under
a suitable choice of $\{(n_b, N_b)\}_{b=1}^\infty$.

Theorem 3. Suppose that $\mathcal{A}^{\rm linear}$ satisfies the assumptions of Lemma 2 and, in addition,
Assumption 3 holds. If $\{(n_b, N_b)\}_{b=1}^\infty$ is an asymptotically admissible discretization policy with
$n_b/\log b \to a \in (-1/(m\nu\log c), \infty)$, where $c$ and $\nu$ are as in Definition 3 and Assumption
3, respectively, then

\[ \lim_{b\to\infty} \frac{\log e(\mathcal{A}^{\rm linear}, n_b, N_b)}{\log b} = -\frac{1}{m\nu}. \]

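Proof. Algebraic manipulations give that

\[ \frac{KL}{N^{1/m}} = \exp\left(\log KL - \frac{1}{m\nu}\log\left(\frac{nMN^\nu}{b}\right)
   - \frac{1}{m\nu}\log\left(\frac{\log b}{nM}\right)
   - \frac{1}{m\nu}\log b + \frac{1}{m\nu}\log\log b\right), \]

where $M$ is as in Assumption 3, and

\[ c^n\kappa = \exp\left(\log\kappa + n\log c\right)
   = \exp\left(\log\kappa + \log b\,\frac{n}{\log b}\,\log c\right). \]

Hence,

\begin{align*}
e(\mathcal{A}^{\rm linear}, n, N)
= \exp\left(-\frac{1}{m\nu}(\log b - \log\log b)\right)
\Bigg[ &\exp\left(\log KL - \frac{1}{m\nu}\log\left(\frac{nMN^\nu}{b}\right)
        - \frac{1}{m\nu}\log\left(\frac{\log b}{nM}\right)\right) \\
&+ \exp\left(\log\kappa + \left(\frac{n}{\log b}\log c + \frac{1}{m\nu}\right)\log b
        - \frac{1}{m\nu}\log\log b\right)\Bigg]. \tag{6}
\end{align*}

Since $a > -1/(m\nu\log c)$, $n_b\log c/\log b + 1/(m\nu) \to a\log c + 1/(m\nu) < 0$, as $b \to \infty$.
Consequently, the expression in the brackets in (6), with $n$ and $N$ replaced by $n_b$ and $N_b$,
respectively, tends to $\exp(\log KL - (1/(m\nu))\log(1/(aM)))$, as $b \to \infty$. The conclusion then
follows from (6) after taking logarithms, dividing by $\log b$, and taking limits. □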
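
The role of the threshold $-1/(m\nu\log c)$ in Theorem 3 can be illustrated numerically. The sketch below is a minimal example under assumed values ($c = 0.5$, $\kappa = KL = M = 1$, $m = 2$, $\nu = 1.5$), with $n_b = a\log b$ and $N_b$ treated as real numbers; the function names are hypothetical.

```python
import math

def policy_threshold(c, m, nu):
    """Smallest admissible limit of n_b/log(b) in Theorem 3 (exclusive)."""
    return -1.0 / (m * nu * math.log(c))

def total_error_linear(b, a, c=0.5, kappa=1.0, KL=1.0, m=2, nu=1.5, M=1.0):
    """Bound c^n*kappa + KL/N^(1/m) under the policy n = a*log(b)."""
    n = a * math.log(b)                # real-valued iteration count (assumption)
    N = (b / (M * n)) ** (1.0 / nu)    # spend the whole budget: n*M*N^nu = b
    return kappa * c ** n + KL / N ** (1.0 / m)

def empirical_rate(err, b1, b2):
    """Slope of log(err) against log(b) between two budgets."""
    return (math.log(err(b2)) - math.log(err(b1))) / (math.log(b2) - math.log(b1))

print(policy_threshold(0.5, 2, 1.5))                                        # threshold for a
print(empirical_rate(lambda b: total_error_linear(b, a=2.0), 1e30, 1e60))   # a above threshold
print(empirical_rate(lambda b: total_error_linear(b, a=0.2), 1e30, 1e60))   # a below threshold
```

When $a$ is below the threshold, $c^{n_b} = b^{a\log c}$ decays more slowly than $b^{-1/(m\nu)}$, so the observed slope should be close to $a\log c$ rather than $-1/(m\nu)$.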


3.4  Sublinear  Optimization  Algorithm 

We next consider the situation when the optimization algorithm for solving $(P_N)$ is sublinearly
convergent as given in the following definition.

Definition 4. An optimization algorithm $\mathcal{A}$ converges sublinearly with degree $\gamma \in (0, \infty)$ on
$\{(P_N)\}_{N=\bar N}^\infty$ when $X_N^*$ is nonempty for $N \ge \bar N$ and there exists a constant $C \in [0, \infty)$ such
that

\[ \psi_N(x_N^n) - \psi_N^* \le \frac{C}{n^\gamma} \]

for all $n \in \mathbb{N}$ and $N \ge \bar N$, $N \in \mathbb{N}$. □

The subgradient method is sublinearly convergent in the sense of Definition 4 with
$\gamma = 1/2$ and $C = D_X L_\phi$ when $(P_N)$ is convex, where $D_X$ is the diameter of $X$ and $L_\phi$
is a Lipschitz constant of $\phi(\cdot, y)$ on $X$ independent of $y \in Y$; see [28, pp. 142-143]. In
view of Definition 4, we define the optimization error bound for a sublinearly convergent
optimization algorithm $\mathcal{A}^{\rm sublin}$ to be

\[ \Delta_N^n(\mathcal{A}^{\rm sublin}) = C/n^\gamma \]

and the total error bound for $n, N \in \mathbb{N}$ to be

\[ e(\mathcal{A}^{\rm sublin}, n, N) = C/n^\gamma + LK/N^{1/m}. \]


The next result gives an optimal discretization policy for a sublinearly convergent optimization
algorithm and also shows the corresponding rate of convergence of the total error
bound.

Theorem 4. Suppose that Assumption 1 holds and that $\mathcal{A}^{\rm sublin}$ is a sublinearly convergent
algorithm with degree $\gamma \in (0, \infty)$ on $\{(P_N)\}_{N=\bar N}^\infty$, with $\bar N$ as in Assumption 1. Suppose
also that $\mathcal{A}^{\rm sublin}$ satisfies Assumptions 2 and 3, and that $\{(n_b, N_b)\}_{b=1}^\infty$ is an asymptotically
admissible discretization policy. Then,

\[ \liminf_{b\to\infty} \frac{\log e(\mathcal{A}^{\rm sublin}, n_b, N_b)}{\log b} \ge -\frac{1}{m\nu + 1/\gamma}, \]

where $\nu$ is as in Assumption 3 and $m$ is the uncertainty dimension.
Moreover, if $n_b/b^{1/(m\nu\gamma+1)} \to a \in (0, \infty)$, as $b \to \infty$, then

\[ \lim_{b\to\infty} \frac{\log e(\mathcal{A}^{\rm sublin}, n_b, N_b)}{\log b} = -\frac{1}{m\nu + 1/\gamma}. \]

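Proof. For any $n, N \in \mathbb{N}$,

\begin{align*}
\log e(\mathcal{A}^{\rm sublin}, n, N)
&= \log(C/n^\gamma + KL/N^{1/m}) \\
&\ge \log(\max\{C/n^\gamma, KL/N^{1/m}\}) \\
&= \max\{\log C - \gamma\log n,\ \log KL - (1/m)\log N\}.
\end{align*}

Let $\{(n_b, N_b)\}_{b=1}^\infty$ be an arbitrary asymptotically admissible discretization policy. If
$n_b \ge b^{1/(m\nu\gamma+1)}$, then

\begin{align*}
\frac{\log e(\mathcal{A}^{\rm sublin}, n_b, N_b)}{\log b}
&\ge \frac{\log KL - \frac{1}{m}\log N_b}{\log b} \\
&= \frac{\log KL - \frac{1}{m\nu}\log\left(\frac{N_b^\nu n_b}{b}\,\frac{b}{n_b}\right)}{\log b} \\
&\ge \frac{\log KL - \frac{1}{m\nu}\log\left(\frac{N_b^\nu n_b}{b}\,b^{m\nu\gamma/(m\nu\gamma+1)}\right)}{\log b} \\
&= \frac{\log KL - \frac{1}{m\nu}\log\left(\frac{N_b^\nu n_b}{b}\right)
        - \frac{1}{m\nu}\log b^{m\nu\gamma/(m\nu\gamma+1)}}{\log b} \\
&= \frac{\log KL - \frac{1}{m\nu}\log\left(\frac{N_b^\nu n_b}{b}\right)}{\log b} - \frac{1}{m\nu + 1/\gamma}.
\end{align*}

If $n_b < b^{1/(m\nu\gamma+1)}$, then

\begin{align*}
\frac{\log e(\mathcal{A}^{\rm sublin}, n_b, N_b)}{\log b}
&\ge \frac{\log C - \gamma\log n_b}{\log b} \\
&\ge \frac{\log C - \gamma\log b^{1/(m\nu\gamma+1)}}{\log b} \\
&= \frac{\log C}{\log b} - \frac{1}{m\nu + 1/\gamma}.
\end{align*}

Hence, for any $b \in \mathbb{N}$,

\[ \frac{\log e(\mathcal{A}^{\rm sublin}, n_b, N_b)}{\log b}
   \ge \min\left\{\frac{\log KL - \frac{1}{m\nu}\log\left(\frac{N_b^\nu n_b}{b}\right)}{\log b},\
   \frac{\log C}{\log b}\right\} - \frac{1}{m\nu + 1/\gamma}. \]

The first result then follows by taking limits as $b \to \infty$, utilizing the fact that $N_b^\nu n_b/b \to 1/M$,
as $b \to \infty$, where $M$ is as in Assumption 3.

Next, let $\{(n_b, N_b)\}_{b=1}^\infty$ be an asymptotically admissible discretization policy satisfying
$n_b/b^{1/(m\nu\gamma+1)} \to a \in (0, \infty)$, as $b \to \infty$. Then, by algebraic manipulation,

\begin{align*}
e(\mathcal{A}^{\rm sublin}, n_b, N_b)
&= \frac{C}{n_b^\gamma} + \frac{KL}{N_b^{1/m}} \\
&= \left[ C\left(\frac{b^{1/(m\nu\gamma+1)}}{n_b}\right)^{\gamma}
   + KL\left(\frac{b}{N_b^\nu n_b}\right)^{1/(m\nu)}
     \left(\frac{n_b}{b^{1/(m\nu\gamma+1)}}\right)^{1/(m\nu)} \right] b^{-1/(m\nu + 1/\gamma)}.
\end{align*}

Since

\[ C\left(\frac{b^{1/(m\nu\gamma+1)}}{n_b}\right)^{\gamma}
   + KL\left(\frac{b}{N_b^\nu n_b}\right)^{1/(m\nu)}
     \left(\frac{n_b}{b^{1/(m\nu\gamma+1)}}\right)^{1/(m\nu)}
   \to C/a^\gamma + KL(aM)^{1/(m\nu)}, \]

as $b \to \infty$, where $M$ is as in Assumption 3, and

\[ \frac{\log e(\mathcal{A}^{\rm sublin}, n_b, N_b)}{\log b}
   = \log\left( C\left(\frac{b^{1/(m\nu\gamma+1)}}{n_b}\right)^{\gamma}
   + KL\left(\frac{b}{N_b^\nu n_b}\right)^{1/(m\nu)}
     \left(\frac{n_b}{b^{1/(m\nu\gamma+1)}}\right)^{1/(m\nu)} \right) \Big/ \log b
   - \frac{1}{m\nu + 1/\gamma}, \]

the second part of the theorem follows after taking limits as $b \to \infty$. □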
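
The two parts of Theorem 4 can be checked on a small numerical example. The sketch below uses the policy $n_b = a\,b^{\theta}$ with $N_b$ determined by $n_b M N_b^\nu = b$; the default $\theta = 1/(m\nu\gamma+1)$ is the choice in the theorem, and any other $\theta$ is a hypothetical alternative for comparison. The constants ($C = KL = M = a = 1$, $m = 2$, $\nu = 1$, $\gamma = 1/2$) and the real-valued treatment of $n_b$, $N_b$ are assumptions for illustration.

```python
import math

def total_error_sublin(b, theta=None, a=1.0, gamma=0.5, C=1.0, KL=1.0,
                       m=2, nu=1.0, M=1.0):
    """Bound C/n^gamma + KL/N^(1/m) under the policy n = a*b^theta."""
    if theta is None:
        theta = 1.0 / (m * nu * gamma + 1.0)   # exponent used in Theorem 4
    n = a * b ** theta                         # real-valued iteration count (assumption)
    N = (b / (M * n)) ** (1.0 / nu)            # spend the whole budget: n*M*N^nu = b
    return C / n ** gamma + KL / N ** (1.0 / m)

def empirical_rate(err, b1, b2):
    """Slope of log(err) against log(b) between two budgets."""
    return (math.log(err(b2)) - math.log(err(b1))) / (math.log(b2) - math.log(b1))

predicted = -1.0 / (2 * 1.0 + 1.0 / 0.5)       # -1/(m*nu + 1/gamma)
rate_opt = empirical_rate(total_error_sublin, 1e6, 1e12)
rate_alt = empirical_rate(lambda b: total_error_sublin(b, theta=0.8), 1e6, 1e12)
print(predicted, rate_opt, rate_alt)
```

Under the policy of the theorem both error terms decay at the same rate, so the two-point slope matches the predicted exponent; spending too many iterations ($\theta = 0.8$) leaves the discretization error dominant and the slope closer to zero.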

We see from Theorem 4 that the rate of convergence of the total error bound in the
case of a sublinearly convergent optimization algorithm is apparently worse than the best
possible rate achievable by finite, superlinear, and linear algorithms (see Theorems 1, 2, and 3),
even for the optimal choice of discretization policy given by the second part of the theorem.
Hence, there is a nontrivial computational cost of optimization in this case. As expected, if $\gamma$
tends to infinity, then the rate in the sublinear case, under the optimal discretization policy,
tends to that of the finite, superlinear, and linear cases. We note, however, that $\nu$ is typically
smaller in the case of a sublinear algorithm than for superlinear and linear algorithms; see
the discussion after Assumption 3. For example, in the case of the subgradient method,
$\nu = 1$, and, since $\gamma = 1/2$ in that case, we obtain from Theorem 4 a rate of convergence
of the total error bound of $b^{-1/(m+2)}$. In contrast, for a linearly convergent optimization
algorithm with $\nu = 1.5$, we obtain a rate of convergence of the total error bound of $b^{-2/(3m)}$.
Hence, for all uncertainty dimensions $m < 4$, the linear optimization algorithm results in a
better rate of convergence than the sublinear algorithm. For $m = 4$, the rates are the same.
For larger $m$, the sublinear algorithm obtains the better rate. Consequently, the results of
this section indicate that the intuitive inclination of using a superlinear or linear algorithm
instead of a sublinear one within a discretization algorithm may not always be supported by
the above analysis. The next section examines one particular optimization algorithm based
on exponential smoothing that behaves similarly to a sublinear algorithm.
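
The comparison in the preceding paragraph amounts to comparing the exponents $1/(m\nu)$ and $1/(m\nu + 1/\gamma)$. The short sketch below tabulates both for the values quoted in the text ($\nu = 1.5$ for the linear algorithm; $\nu = 1$, $\gamma = 1/2$ for the subgradient method); the function names are illustrative.

```python
import math

def exponent_fast(m, nu):
    """Exponent r in the rate b^(-r) of Theorems 1-3: r = 1/(m*nu)."""
    return 1.0 / (m * nu)

def exponent_sublinear(m, nu, gamma):
    """Exponent r in the rate b^(-r) of Theorem 4: r = 1/(m*nu + 1/gamma)."""
    return 1.0 / (m * nu + 1.0 / gamma)

for m in range(1, 9):
    r_lin = exponent_fast(m, nu=1.5)
    r_sub = exponent_sublinear(m, nu=1.0, gamma=0.5)
    if math.isclose(r_lin, r_sub):
        better = "tie"
    elif r_lin > r_sub:
        better = "linear"
    else:
        better = "sublinear"
    print(m, round(r_lin, 4), round(r_sub, 4), better)
```

A larger exponent means a faster decay of the total error bound, so the table reproduces the crossover at $m = 4$ described above.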


4  Smoothing  Optimization  Algorithm 

In this section, we consider an optimization algorithm for solving $(P_N)$ based on exponential
smoothing of $\psi_N(\cdot)$. Instead of solving $(P_N)$ directly using a finite minimax algorithm as
discussed in the previous section, the exponential smoothing algorithm solves $(P_N)$ by solving
the smooth approximate problem

\[ (P_{Np}) \qquad \min_{x \in X} \psi_{Np}(x), \]

where $p > 0$ is a smoothing parameter and

\[ \psi_{Np}(x) = \frac{1}{p}\log\left(\sum_{y \in Y_N} \exp\big(p\,\phi(x, y)\big)\right). \tag{7} \]

The function $\psi_{Np}(\cdot)$ is a smooth approximation of $\psi_N(\cdot)$ first proposed in [29] and examined
in [12, 27, 30-33] for solving finite minimax problems. It is well known that

\[ 0 \le \psi_{Np}(x) - \psi_N(x) \le \log N/p, \tag{8} \]

for all $x \in \mathbb{R}^d$, $N \in \mathbb{N}$, and $p > 0$; see for example [12]. Consequently, a near-optimal solution
of $(P_N)$ can be obtained by solving $(P_{Np})$ for a sufficiently large $p$. A main advantage of
the smoothing algorithm is that when $\phi(\cdot, y)$ is smooth for all $y \in Y_N$, $\psi_{Np}(\cdot)$ is smooth
and $(P_{Np})$ is solvable by unconstrained smooth optimization algorithms (if $X = \mathbb{R}^d$) or by
smooth optimization algorithms for simple constraints (if $X \subset \mathbb{R}^d$). Hence, the smoothing
algorithm avoids solving large-scale quadratic programs as in the case of SQP and PPP
minimax algorithms (see for example [11] and [8, Section 2.4]). In fact, each iteration of the
smoothing algorithm may only require the evaluation of $\phi(\cdot, y)$ and $\nabla_x\phi(\cdot, y)$, $y \in Y_N$, at the
current iterate (and at line search points), which imposes a computational cost proportional
to $N$ per iteration. Hence, it is reasonable to assume that $\nu = 1$ in Assumption 3 for the
smoothing algorithm.
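
To make (7) and (8) concrete, the following sketch evaluates $\psi_N$ and $\psi_{Np}$ for a finite set of points. The test function $\phi(x, y) = (x - y)^2$, the grid on $[0, 1]$, and the function names are assumptions chosen for illustration; subtracting the largest exponent before exponentiating is a standard numerical safeguard, not something prescribed by the text.

```python
import math

def psi_N(x, ys, phi):
    """Finite max function: max over y in Y_N of phi(x, y)."""
    return max(phi(x, y) for y in ys)

def psi_Np(x, ys, phi, p):
    """Exponential smoothing (7): (1/p) * log(sum_y exp(p * phi(x, y)))."""
    scaled = [p * phi(x, y) for y in ys]
    shift = max(scaled)                       # avoids overflow in exp for large p
    return (shift + math.log(sum(math.exp(s - shift) for s in scaled))) / p

ys = [i / 10.0 for i in range(11)]            # Y_N with N = 11 points (assumed)
phi = lambda x, y: (x - y) ** 2               # illustrative choice of phi

for p in (1.0, 10.0, 100.0):
    gap = psi_Np(0.3, ys, phi, p) - psi_N(0.3, ys, phi)
    print(p, gap, math.log(len(ys)) / p)      # gap and the bound log(N)/p from (8)
```

The printed gap should be nonnegative and no larger than $\log N/p$, and it shrinks as $p$ grows, in line with (8).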

Specifically, for a given $N \in \mathbb{N}$, we consider the following smoothing algorithm for solving
$(P_N)$:

Optimization Algorithm $\mathcal{A}^{\rm smooth}$ for Solving $(P_N)$.

Data. $n \in \mathbb{N}$ and $p > 0$.

Step 1. Construct iterates $\{x_{Np}^i\}_{i=0}^n \subset \mathbb{R}^d$ by applying $n$ iterations of an optimization
algorithm to $(P_{Np})$. □
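
A minimal sketch of this algorithm is given below for a one-dimensional example with $\phi(x, y) = (x - y)^2$ and $X = \mathbb{R}$. Step 1 is carried out by steepest descent with a generic Armijo backtracking rule; the parameters `alpha` and `beta`, the starting point, the test problem, and all function names are assumptions for illustration and need not match the specific algorithms cited later in this section.

```python
import math

def smoothed_value_and_grad(x, ys, p):
    """Value and derivative of psi_Np at scalar x for phi(x, y) = (x - y)^2."""
    scaled = [p * (x - y) ** 2 for y in ys]
    shift = max(scaled)                                   # stabilizes the exponentials
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    value = (shift + math.log(total)) / p
    # Gradient of the smoothed max: weighted average of the gradients of phi(., y).
    grad = sum(w * 2.0 * (x - y) for w, y in zip(weights, ys)) / total
    return value, grad

def a_smooth(ys, p, n, x0=0.0, alpha=0.5, beta=0.8):
    """Apply n steepest-descent iterations with Armijo backtracking to (P_Np)."""
    x = x0
    for _ in range(n):
        value, grad = smoothed_value_and_grad(x, ys, p)
        step = 1.0
        # Shrink the step until the Armijo sufficient-decrease test passes.
        while (step > 1e-12 and
               smoothed_value_and_grad(x - step * grad, ys, p)[0]
               > value - alpha * step * grad ** 2):
            step *= beta
        x -= step * grad
    return x

ys = [i / 10.0 for i in range(11)]    # Y_N: N = 11 grid points on Y = [0, 1]
x_n = a_smooth(ys, p=50.0, n=500)
print(x_n, max((x_n - y) ** 2 for y in ys))
```

For this symmetric example the minimizer of both $\psi_N$ and $\psi_{Np}$ is $x = 0.5$ with $\psi_N^* = 0.25$, which gives a simple reference point for the returned iterate.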

This simple smoothing algorithm $\mathcal{A}^{\rm smooth}$ can be extended to include adaptive adjustment
of the smoothing parameter $p$ (see for example [12]), but here we focus on $\mathcal{A}^{\rm smooth}$.

Discretization of $Y$ combined with exponential smoothing for the solution of $(P)$ is proposed
in [34], where a proof of convergence is provided, but without a rate of convergence
analysis. In this section, we determine the rate of convergence of this approach. Specifically,
we consider the solution of $(P)$ by discretization of $Y$, as in the previous sections, followed
by the application of $\mathcal{A}^{\rm smooth}$ to $(P_N)$. While we above consider discretization policies of the
form $\{(n_b, N_b)\}_{b=1}^\infty$, we now also need to consider a smoothing policy $\{p_b\}_{b=1}^\infty$, with $p_b > 0$, for
all $b \in \mathbb{N}$. A smoothing policy specifies the smoothing parameter to be used in $\mathcal{A}^{\rm smooth}$ given
a particular computing budget $b$. The discretization policy gives the number of iterations to
carry out in $\mathcal{A}^{\rm smooth}$ as well as the level of discretization.

We assume that Assumption 3 holds for $\mathcal{A}^{\rm smooth}$ regardless of $p$, i.e., the computational
work to carry out $n$ iterations of $\mathcal{A}^{\rm smooth}$ is independent of $p$. In view of (7), the value of $p$
does not influence the work to compute $\psi_{Np}(x)$ and its gradient and hence this assumption
is reasonable. However, as shown empirically in [32] and analytically in [12], a large value of
$p$ results in ill-conditioning of $(P_{Np})$ and a slow rate of convergence of optimization algorithms
applied to that problem. We adopt the following assumption, which, in part, is motivated
by results in [12] as discussed subsequently.

Assumption 4. Suppose that there exists an $\bar N \in \mathbb{N}$ such that if $\{x_{Np}^i\}_{i=0}^n$ is constructed
by optimization algorithm $\mathcal{A}^{\rm smooth}$ with data $n \in \mathbb{N}$ and $p > 0$ when applied to $(P_N)$, $N \in
\mathbb{N}$, $N \ge \bar N$, then the following holds:

(i) $x_{Np}^i \in X$ for all $i = 0, 1, \ldots, n$, $N \in \mathbb{N}$, $N \ge \bar N$, and $p > 0$,

(ii) $X_N^*$ is nonempty for $N \in \mathbb{N}$, $N \ge \bar N$, and

(iii) there exist constants $k \in (0, 1)$ and $\kappa \in [0, \infty)$ such that

\[ \psi_N(x_{Np}^n) - \psi_N^* \le \left(1 - \frac{k}{p}\right)^n \kappa + \frac{2\log N}{p} \tag{9} \]

for any $n, N \in \mathbb{N}$, $N \ge \bar N$ and $p \ge 1$. □

Part (i) of Assumption 4 requires that Algorithm $\mathcal{A}^{\rm smooth}$ generates feasible iterates,
which is easily achieved since $X$ is either $\mathbb{R}^d$ or a simple closed convex subset. Part (iii) is
stronger and stipulates that the “optimization error” after executing Algorithm $\mathcal{A}^{\rm smooth}$ is
bounded by the sum of two terms. The first term bounds the error caused by “incomplete”
optimization and vanishes as $n \to \infty$. The second term bounds the smoothing error and
tends to zero as $p \to \infty$; see (8). For a fixed $p \ge 1$, the first term indicates a linear rate of
convergence as $n \to \infty$. However, the rate of convergence coefficient tends to 1 as $p$ grows,
reflecting the increasing ill-conditioning of $(P_{Np})$. Hence, Algorithm $\mathcal{A}^{\rm smooth}$ may converge
only sublinearly if $p \to \infty$. If Step 1 of Algorithm $\mathcal{A}^{\rm smooth}$ utilizes the steepest descent
or projected gradient methods to solve $(P_{Np})$, then Assumption 4 holds under standard
assumptions as stated next.

Proposition 2. Suppose that (i) $\phi(\cdot, \cdot)$ is twice continuously differentiable on $X \times Y$, (ii)
there exists a constant $\lambda \in (0, \infty)$ such that

\[ \lambda\|z\|^2 \le \langle z, \nabla_{xx}^2\phi(x, y) z\rangle, \]

for all $x \in X$, $z \in \mathbb{R}^d$, and $y \in Y$, (iii) Step 1 of Algorithm $\mathcal{A}^{\rm smooth}$ utilizes either the
steepest descent method with Armijo step size rule (see Algorithm 1.3.3 in [8]) if $X = \mathbb{R}^d$ or
otherwise the projected gradient method with Armijo step size rule (see Algorithm 1.3.16 in
[8]), (iv) there exists a constant $C \in [0, \infty)$ such that the initial iterate $x_{Np}^0 \in X$ of Step 1
of Algorithm $\mathcal{A}^{\rm smooth}$ satisfies $\psi(x_{Np}^0) \le C$ for all $N \in \mathbb{N}$ and $p > 0$, and (v) Assumption 1
holds. Then, Assumption 4 holds with $\bar N$ as in Assumption 1.

Proof. Part (i) of Assumption 4 follows trivially by the choice of optimization algorithm in
Step 1 of Algorithm $\mathcal{A}^{\rm smooth}$. Part (ii) of Assumption 4 is a direct consequence of Assumption
1. We next consider part (iii).

Using the same arguments as in Lemma 3.1 of [12], we obtain that $\psi_{Np}(\cdot)$ is twice
continuously differentiable and

\[ \lambda\|z\|^2 \le \langle z, \nabla^2\psi_{Np}(x) z\rangle, \tag{10} \]

for any $x \in X$, $z \in \mathbb{R}^d$, $N \in \mathbb{N}$, and $p > 0$. Moreover, a slight generalization of Lemma 3.2 in
[12] yields that for every bounded set $S \subset X$, there exists an $M_S < \infty$ such that

\[ \langle z, \nabla^2\psi_{Np}(x) z\rangle \le p M_S \|z\|^2, \tag{11} \]

for all $x \in S$, $z \in \mathbb{R}^d$, $N \in \mathbb{N}$, and $p \ge 1$.

The steepest descent method with Armijo step size rule and the projected gradient
method with Armijo step size rule have linear rate of convergence in function values under
strong convexity with rate coefficient $1 - \xi\lambda_{\min}/\lambda_{\max}$, where $\xi \in (0, 1)$ (which depends on the
method) and $\lambda_{\max} \ge \lambda_{\min} > 0$ are upper and lower bounds on the eigenvalues of the Hessian
of the objective function on a sufficiently large subset of $\mathbb{R}^d$; see Theorems 1.3.7 and 1.3.18
in [8]. Hence, in view of (10) and (11), $pM_S$ and $\lambda$ provide these upper and lower bounds in
the case of $(P_{Np})$ and therefore

\[ \psi_{Np}(x_{Np}^{n+1}) - \psi_{Np}^* \le \left(1 - \frac{k}{p}\right)\big(\psi_{Np}(x_{Np}^n) - \psi_{Np}^*\big) \]

for all $n, N \in \mathbb{N}$ and $p \ge 1$, with $k = \xi\lambda/M_S \in (0, 1)$. From (8), we then obtain that

\begin{align*}
\psi_N(x_{Np}^n) - \psi_N^*
&\le \psi_{Np}(x_{Np}^n) - \psi_{Np}^* + \frac{\log N}{p} \\
&\le \left(1 - \frac{k}{p}\right)^n \big(\psi_{Np}(x_{Np}^0) - \psi_{Np}^*\big) + \frac{\log N}{p} \\
&\le \left(1 - \frac{k}{p}\right)^n \big(\psi_N(x_{Np}^0) - \psi_N^*\big) + \frac{2\log N}{p} \\
&\le \left(1 - \frac{k}{p}\right)^n \big(\psi(x_{Np}^0) - \psi^* + LK\big) + \frac{2\log N}{p}
\end{align*}

for all $n, N \in \mathbb{N}$, $N \ge \bar N$, and $p \ge 1$, where we use the fact that $-\psi_N^* \le -\psi^* + LK$ for all $N \ge \bar N$,
$N \in \mathbb{N}$, in view of Proposition 1. Since we assume that $\psi(x_{Np}^0) \le C$ for all $N \in \mathbb{N}$ and $p > 0$,
the conclusion follows with $\kappa = C - \psi^* + LK$. □

We note that assumption (iv) in Proposition 2 is rather weak and is satisfied, for example,
if the optimization algorithm used to solve $(P_{Np})$ in Step 1 of Algorithm $\mathcal{A}^{\rm smooth}$ is initialized
with the same iterate regardless of $N \in \mathbb{N}$ and $p > 0$. The next result gives a total error
bound for Algorithm $\mathcal{A}^{\rm smooth}$ under Assumption 4.


Lemma 3. Suppose that Assumptions 1 and 4 hold. If $\{x_{Np}^n\}_{n=0}^\infty$ is generated by Algorithm
$\mathcal{A}^{\rm smooth}$, then

\[ \psi(x_{Np}^n) - \psi^* \le \left(1 - \frac{k}{p}\right)^n \kappa + \frac{LK}{N^{1/m}} + \frac{2\log N}{p} \]

for all $n, N \in \mathbb{N}$, $N \ge \bar N$ and $p \ge 1$, where $\bar N$, $k$, and $\kappa$ are as in Assumption 4 and $L$ and
$K$ as in Assumption 1.

Proof. The conclusion follows directly from Proposition 1 and Assumption 4. □

In view of Lemma 3, we define the optimization error bound for Algorithm $\mathcal{A}^{\rm smooth}$ to
be

\[ \Delta_{Np}^n(\mathcal{A}^{\rm smooth}) = \left(1 - \frac{k}{p}\right)^n \kappa + \frac{2\log N}{p} \]

and the total error bound for $n, N \in \mathbb{N}$ and $p > 0$ to be

\[ e(\mathcal{A}^{\rm smooth}, n, N, p) = \left(1 - \frac{k}{p}\right)^n \kappa + \frac{LK}{N^{1/m}} + \frac{2\log N}{p}. \]

Before we proceed with the main result of this section, we need the following trivial fact.

Lemma 4. For $x \in [0, 1/2]$, $-2x \le \log(1 - x) \le -x$. □

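Theorem 5. Suppose that Assumptions 1, 3, and 4 hold and that $\{(n_b, N_b)\}_{b=1}^\infty$ is an asymptotically
admissible discretization policy and $\{p_b\}_{b=1}^\infty$ is a smoothing policy with $p_b \ge 1$ for all
$b \in \mathbb{N}$. Then,

\[ \liminf_{b\to\infty} \frac{\log e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)}{\log b} \ge -\frac{1}{m\nu + 1}, \]

where $\nu$ is as defined in Assumption 3 and $m$ is the uncertainty dimension.

Moreover, if $p_b/b^{\delta\alpha} \to a \in (0, \infty)$, with $\delta \in (0, 1)$ and $\alpha = 1/(\delta m\nu + 1)$, and $n_b/b^\alpha \to
a' \in (0, \infty)$, then

\[ \lim_{b\to\infty} \frac{\log e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)}{\log b} = -\frac{1}{m\nu + 1/\delta}. \]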

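Proof. We first consider part one. If $N_b$ is bounded as $b \to \infty$, then $e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)$
does not vanish as $b \to \infty$ and the conclusion of part one follows trivially. Hence, suppose
there exists a $b_0 \in \mathbb{N}$ such that $N_b \ge 3$ for all $b \ge b_0$. Then, algebraic manipulations and
Lemma 4 give that for $b \ge b_0$,

\begin{align*}
\log e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)
&= \log\big(e^{n_b\log(1 - k/p_b) + \log\kappa} + e^{-(1/m)\log N_b + \log LK}
   + e^{-\log p_b + \log\log N_b + \log 2}\big) \tag{13} \\
&\ge \log\big(e^{-2kn_b/p_b + \log\kappa} + e^{-(1/m)\log N_b + \log LK}
   + e^{-\log p_b + \log\log N_b + \log 2}\big) \\
&\ge \log\big(\max\{e^{-2kn_b/p_b + \log\kappa},\ e^{-(1/m)\log N_b + \log LK},\
   e^{-\log p_b + \log\log N_b + \log 2}\}\big) \\
&= \max\{-2kn_b/p_b + \log\kappa,\ -(1/m)\log N_b + \log LK,\ -\log p_b + \log\log N_b + \log 2\}.
\end{align*}

We consider three cases. First, if $n_b \ge b^{1/(m\nu+1)}$, $b \ge b_0$, then

\begin{align*}
\frac{\log e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)}{\log b}
&\ge \frac{-(1/m)\log N_b + \log LK}{\log b} \\
&= \frac{-(1/(m\nu))\log(N_b^\nu n_b/b) - (1/(m\nu))\log(b/n_b) + \log LK}{\log b} \\
&\ge \frac{-(1/(m\nu))\log(N_b^\nu n_b/b) - (1/(m\nu))\log b^{m\nu/(m\nu+1)} + \log LK}{\log b} \\
&= \frac{-(1/(m\nu))\log(N_b^\nu n_b/b)}{\log b} - \frac{1}{m\nu + 1} + \frac{\log LK}{\log b}.
\end{align*}

Second, if $n_b < b^{1/(m\nu+1)}$ and $p_b \le b^{1/(m\nu+1)}$, $b \ge b_0$, then

\begin{align*}
\frac{\log e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)}{\log b}
&\ge \frac{-\log p_b + \log\log N_b + \log 2}{\log b} \\
&\ge \frac{-\log p_b}{\log b} \\
&\ge -\frac{1}{m\nu + 1}.
\end{align*}

Third, if $n_b < b^{1/(m\nu+1)}$ and $p_b > b^{1/(m\nu+1)}$, $b \ge b_0$, then

\[ \frac{\log e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)}{\log b}
   \ge \frac{-2kn_b/p_b + \log\kappa}{\log b} \ge \frac{-2k + \log\kappa}{\log b}. \]

Hence, for any $\epsilon > 0$, there exists a $b_1 \ge b_0$ such that

\[ \frac{\log e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)}{\log b} \ge -\frac{1}{m\nu + 1} - \epsilon \]

for all $b \in \mathbb{N}$, $b \ge b_1$. Since $\epsilon$ is arbitrary, the conclusion of part one follows.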

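We next consider part two. Let $b_0 \in \mathbb{N}$ be such that $N_b \ge 3$ for all $b \in \mathbb{N}$, $b \ge b_0$. For
$b \ge b_0$, we define

\[ \underline{e}(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b) = \exp\left(-\frac{2kn_b}{p_b} + \log\kappa\right)
   + \exp\left(-\frac{1}{m}\log N_b + \log LK\right)
   + \exp\left(-\log p_b + \log\log N_b + \log 2\right). \]

We define $\bar e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)$ identically except with $2k$ replaced by $k$. Then, using Lemma
4 and similar arguments as in (13), we obtain that

\[ \underline{e}(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b) \le e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)
   \le \bar e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b) \tag{14} \]

for all $b \in \mathbb{N}$, $b \ge b_0$. We next consider $\bar e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)$ and find that

\[ -\frac{kn_b}{p_b} = -k\,\frac{n_b}{b^\alpha}\,\frac{b^{\delta\alpha}}{p_b}\,b^{\alpha(1-\delta)}, \]

\[ -\frac{1}{m}\log N_b = -\frac{1-\alpha}{m\nu}\log b
   - \frac{1}{m}\log\left(\left(\frac{N_b^\nu n_b}{b}\right)^{1/\nu}\right)
   - \frac{1}{m}\log\left(\left(\frac{b^\alpha}{n_b}\right)^{1/\nu}\right), \]

\[ \log p_b = \delta\alpha\log b + \log\left(\frac{p_b}{b^{\delta\alpha}}\right), \]

and

\[ \log\log N_b = \log\log b + \log\left(\frac{\log\big((N_b^\nu n_b/b)^{1/\nu}\big)}{\log b}
   + \frac{\log\big((b^\alpha/n_b)^{1/\nu}\big)}{\log b} + \frac{1-\alpha}{\nu}\right) \]

for all $b \in \mathbb{N}$, $b \ge b_0$. Using the above expressions, we obtain that for all $b \in \mathbb{N}$, $b \ge b_0$,

\[ \bar e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b) = \exp\left(-\frac{\delta}{\delta m\nu + 1}\log b\right)
   \big(T_1(b) + T_2(b) + T_3(b)\big), \tag{15} \]

where

\[ T_1(b) = \exp\left(-k\,\frac{n_b}{b^\alpha}\,\frac{b^{\delta\alpha}}{p_b}\,b^{\alpha(1-\delta)}
   + \frac{\delta}{\delta m\nu + 1}\log b + \log\kappa\right), \]

\[ T_2(b) = \exp\left(\frac{\alpha - 1}{m\nu}\log b - \frac{1}{m\nu}\log\frac{N_b^\nu n_b}{b}
   - \frac{1}{m\nu}\log\frac{b^\alpha}{n_b} + \log LK + \frac{\delta}{\delta m\nu + 1}\log b\right), \]

and

\begin{align*}
T_3(b) = \exp\Bigg( &-\delta\alpha\log b + \frac{\delta}{\delta m\nu + 1}\log b
   - \log\frac{p_b}{b^{\delta\alpha}} + \log\log b \\
&+ \log\left(\frac{\log\big((N_b^\nu n_b/b)^{1/\nu}\big)}{\log b}
   + \frac{\log\big((b^\alpha/n_b)^{1/\nu}\big)}{\log b} + \frac{1-\alpha}{\nu}\right) + \log 2\Bigg).
\end{align*}

Since $n_b/b^\alpha \to a'$, $b^{\delta\alpha}/p_b \to 1/a$, as $b \to \infty$, $\alpha = 1/(\delta m\nu + 1)$, and $\delta \in (0, 1)$, we obtain that
$T_1(b) \to 0$ as $b \to \infty$. We also obtain that

\begin{align*}
T_2(b) &= \exp\left(-\frac{1}{m\nu}\log\frac{N_b^\nu n_b}{b} - \frac{1}{m\nu}\log\frac{b^\alpha}{n_b} + \log LK\right) \\
&\to \exp\left(-\frac{1}{m\nu}\log\frac{1}{M} - \frac{1}{m\nu}\log\frac{1}{a'} + \log LK\right),
\end{align*}

as $b \to \infty$, where $M$ is as in Assumption 3. Moreover, we find that there exist constants
$b_1 \ge b_0$ and $C \in [0, \infty)$ such that

\begin{align*}
T_3(b) &= \exp\left(-\log\frac{p_b}{b^{\delta\alpha}} + \log\log b
   + \log\left(\frac{\log\big((N_b^\nu n_b/b)^{1/\nu}\big)}{\log b}
   + \frac{\log\big((b^\alpha/n_b)^{1/\nu}\big)}{\log b} + \frac{1-\alpha}{\nu}\right) + \log 2\right) \\
&\le C e^{\log\log b} = C\log b
\end{align*}

for all $b \ge b_1$, $b \in \mathbb{N}$. Consequently, there exist constants $C' \in (C, \infty)$ and $b_2 \in \mathbb{N}$, $b_2 \ge b_1$,
such that

\[ T_1(b) + T_2(b) + T_3(b) \le C'\log b \]

for all $b \in \mathbb{N}$, $b \ge b_2$. Hence, for $b \ge b_2$,

\begin{align*}
\frac{\log \bar e(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)}{\log b}
&\le \frac{\log\big(e^{-(\delta/(\delta m\nu+1))\log b}\,C'\log b\big)}{\log b} \\
&= -\frac{\delta}{\delta m\nu + 1} + \frac{\log C'}{\log b} + \frac{\log\log b}{\log b}
\to -\frac{1}{m\nu + 1/\delta},
\end{align*}

as $b \to \infty$. Repeating the same argument for $\underline{e}(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)$, we obtain that

\[ \liminf_{b\to\infty} \frac{\log \underline{e}(\mathcal{A}^{\rm smooth}, n_b, N_b, p_b)}{\log b} \ge -\frac{1}{m\nu + 1/\delta}. \]

Hence, the conclusion of part two of the theorem follows from (14). □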

We see from Theorem 5 that Algorithm $\mathcal{A}^{\rm smooth}$ is competitive with any sublinear optimization
algorithm of degree $\gamma \in (0, 1]$ (such as the subgradient method with $\gamma = 1/2$) as
$\delta \in (0, 1)$ can be selected arbitrarily close to one. While the best possible rate of $b^{-1/(m\nu)}$
is not attainable even for the optimal discretization and smoothing policy specified in Theorem
5, Algorithm $\mathcal{A}^{\rm smooth}$ has $\nu = 1$ and therefore may still be competitive under certain
circumstances.
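
The policy in the second part of Theorem 5 can be explored with the sketch below, which evaluates the total error bound for $p_b = a\,b^{\delta\alpha}$ and $n_b = a' b^{\alpha}$ with $\alpha = 1/(\delta m\nu + 1)$. The constants ($k = 0.5$, $\kappa = KL = M = a = a' = 1$, $m = 2$, $\nu = 1$), the real-valued $n_b$ and $N_b$, and the function names are assumptions for illustration. The term $(1 - k/p)^n$ is computed through `math.log1p` so that it remains accurate when $k/p$ is tiny.

```python
import math

def total_error_smooth(b, delta, a=1.0, a_prime=1.0, k=0.5, kappa=1.0,
                       KL=1.0, m=2, nu=1.0, M=1.0):
    """Bound (1-k/p)^n*kappa + KL/N^(1/m) + 2*log(N)/p under the Theorem 5 policy."""
    alpha = 1.0 / (delta * m * nu + 1.0)
    p = a * b ** (delta * alpha)             # smoothing policy
    n = a_prime * b ** alpha                 # real-valued iteration count (assumption)
    N = (b / (M * n)) ** (1.0 / nu)          # spend the whole budget: n*M*N^nu = b
    opt = kappa * math.exp(n * math.log1p(-k / p))
    return opt + KL / N ** (1.0 / m) + 2.0 * math.log(N) / p

def predicted_exponent(delta, m=2, nu=1.0):
    """Limit of log(e)/log(b) in the second part of Theorem 5."""
    return -1.0 / (m * nu + 1.0 / delta)

def empirical_rate(err, b1, b2):
    """Slope of log(err) against log(b) between two budgets."""
    return (math.log(err(b2)) - math.log(err(b1))) / (math.log(b2) - math.log(b1))

rate = empirical_rate(lambda b: total_error_smooth(b, delta=0.5), 1e20, 1e40)
print(rate, predicted_exponent(0.5))
for delta in (0.5, 0.9, 0.99):
    print(delta, predicted_exponent(delta))
```

Because the smoothing term carries a factor $\log N_b$, the observed slope approaches the limit only slowly. In this parameterization, values of $\delta$ close to one also require very large budgets before the factor $(1 - k/p_b)^{n_b}$ becomes negligible.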

5  Conclusions 

In this paper, we examined the rate of convergence of discretization algorithms for semi-infinite
minimax problems as a computing budget $b$ tends to infinity. These algorithms
approximately solve finite minimax problems as subproblems and we study the rates resulting
from the use of various classes of optimization algorithms for this purpose. We find that in the
case of superlinear and linear optimization algorithms, the best possible rate of convergence
is $b^{-1/(m\nu)}$, where $m$ is the uncertainty dimension in the semi-infinite minimax problem and
$\nu$ is a positive parameter related to the computational work per iteration in the optimization
algorithms. The best rate is attained with a particular optimal discretization policy identified
in the paper and cannot be improved upon due to the unavoidable discretization error.
Other policies may result in substantially slower rates. In the case of sublinear optimization
algorithms, with optimization error of order $O(1/n^\gamma)$, $\gamma > 0$, after $n$ iterations, the best
possible rate of convergence is $b^{-1/(m\nu + 1/\gamma)}$, which is attained using an optimal discretization
policy constructed in the paper. If a smoothing optimization algorithm solves the finite
minimax problems, then the best possible rate of convergence is $b^{-1/(m\nu + 1)}$, which one can
get arbitrarily close to using a specific discretization and smoothing policy.

The algorithm parameter $\nu$ varies; superlinear and linear finite minimax algorithms may
have $\nu = 1.5$ and sublinear and smoothing algorithms $\nu = 1$. Consequently, under these
assumptions, a sublinear algorithm with $\gamma = 1/2$ as in the case of the subgradient method
obtains a rate of convergence of $b^{-1/(m+2)}$, which is better than the $b^{-2/(3m)}$ obtained by superlinear
and linear algorithms for $m > 4$. For $m = 4$, the rates are identical. The smoothing
algorithm obtains essentially $b^{-1/(m+1)}$, which is better than superlinear and linear algorithms
for $m > 2$. For $m = 2$ the rates are identical. The analysis of this paper therefore indicates
that inexpensive sublinear and, in particular, smoothing algorithms may be preferred to solve
the large-scale finite minimax problems arising in discretization algorithms for semi-infinite
minimax problems.